Volumetric and Multi-View CNNs for Object Classification on 3D Data Supplementary Material
Abstract
Training for Our Volumetric CNNs
To produce occupancy grids from meshes, the faces of a mesh are subdivided until the longest edge fits within a single voxel; then all voxels that intersect a face are marked as occupied. For 3D resolutions 10, 30 and 60, we generate voxelizations with central regions of 10, 24 and 54 and padding of 0, 3 and 3, respectively. Voxelization is followed by a hole-filling step that marks the holes inside the models as occupied voxels. To augment our training data with azimuth and elevation rotations, we generate 60 voxelizations for each model, with azimuth uniformly sampled from [0, 360] and elevation uniformly sampled from [−45, 45] (both in degrees).
We use a Nesterov solver with learning rate 0.005 and weight decay 0.0005 for training. Training on a K40 using Caffe [2] takes around 6 hours for the subvolume supervision CNN and 20 hours for the anisotropic probing CNN. In the multi-orientation versions, SubvolumeSup splits at the last conv layer and AniProbing splits at the second-to-last conv layer. Volumetric CNNs trained on single-orientation inputs are then used to initialize their multi-orientation versions for fine-tuning. At test time, 20 orientations of a CAD model occupancy grid (equally distributed azimuth and elevation uniformly sampled from [−45, 45]) are input to MO-VCNN to make a class prediction.
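The grid sizes and orientation sampling described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code. It assumes the padding is applied on each side of the central region, which is consistent with the stated numbers (24 + 2·3 = 30 and 54 + 2·3 = 60). The function names, the seed parameter, and the choice to start the test azimuths at 0 degrees are hypothetical.

```python
import random

# (resolution, central region, padding) triples as stated in the text.
GRID_SPECS = [(10, 10, 0), (30, 24, 3), (60, 54, 3)]


def padded_size(central, padding):
    """Grid side length, ASSUMING padding is added on each side."""
    return central + 2 * padding


def sample_training_orientations(n=60, seed=0):
    """Draw n (azimuth, elevation) pairs in degrees for augmentation.

    Azimuth is uniform on [0, 360], elevation uniform on [-45, 45].
    """
    rng = random.Random(seed)
    return [(rng.uniform(0.0, 360.0), rng.uniform(-45.0, 45.0))
            for _ in range(n)]


def sample_test_orientations(n=20, seed=0):
    """Equally spaced azimuths with uniformly sampled elevations.

    Starting the azimuths at 0 degrees is an assumption.
    """
    rng = random.Random(seed)
    return [(i * 360.0 / n, rng.uniform(-45.0, 45.0)) for i in range(n)]


if __name__ == "__main__":
    for resolution, central, padding in GRID_SPECS:
        print(resolution, padded_size(central, padding))
    print(sample_test_orientations()[:3])
```

Each sampled pair would then drive one rotation of the mesh before voxelization; the rotation and voxelization steps themselves are outside the scope of this sketch.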
Related resources
Hand Gesture Recognition from RGB-D Data using 2D and 3D Convolutional Neural Networks: a comparative study
Despite considerable advances in recognizing hand gestures from still images, there are still many challenges in the classification of hand gestures in videos. The latter comes with more challenges, including higher computational complexity and the arduous task of representing temporal features. Hand movement dynamics, represented by temporal features, have to be extracted by analyzing the total fr...
FusionNet: 3D Object Classification Using Multiple Data Representations
High-quality 3D object recognition is an important component of many vision and robotics systems. We tackle the object recognition problem using two data representations, to achieve leading results on the Princeton ModelNet challenge. The two representations: • Volumetric representation: the 3D object is discretized spatially as binary voxels, 1 if the voxel is occupied and 0 otherwise. • Pixel ...
3D Scene and Object Classification Based on Information Complexity of Depth Data
In this paper the problem of 3D scene and object classification from depth data is addressed. In contrast to high-dimensional feature-based representation, the depth data is described in a low dimensional space. In order to remedy the curse of dimensionality problem, the depth data is described by a sparse model over a learned dictionary. Exploiting the algorithmic information theory, a new def...
What You Sketch Is What You Get: 3D Sketching using Multi-View Deep Volumetric Prediction
Sketch-based modeling strives to bring the ease and immediacy of drawing to the 3D world. However, while drawings are easy for humans to create, they are very challenging for computers to interpret due to their sparsity and ambiguity. We propose a data-driven approach that tackles this challenge by learning to reconstruct 3D shapes from one or more drawings. At the core of our approach is a dee...
A deep learning approach for pose estimation from volumetric OCT data.
Tracking the pose of instruments is a central problem in image-guided surgery. For microscopic scenarios, optical coherence tomography (OCT) is increasingly used as an imaging modality. OCT is suitable for accurate pose estimation due to its micrometer range resolution and volumetric field of view. However, OCT image processing is challenging due to speckle noise and reflection artifacts in add...